Learning to Predict Chemical Reactions

Authors

  • Matthew A. Kayala
  • Chloé-Agathe Azencott
  • Jonathan H. Chen
  • Pierre Baldi
Abstract

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Approaches to the reaction prediction problem can be organized around three poles corresponding to: (1) physical laws; (2) rule-based expert systems; and (3) inductive machine learning. Previous approaches at these poles, respectively, are not high-throughput, are not generalizable or scalable, and lack sufficient data and structure to be implemented. We propose a new approach to reaction prediction that draws on elements from each pole. Using a physically inspired conceptualization, we describe single mechanistic reactions as interactions between coarse approximations of molecular orbitals (MOs) and use topological and physicochemical attributes as descriptors. Using an existing rule-based system (Reaction Explorer), we derive a restricted chemistry data set consisting of 1630 full multistep reactions with 2358 distinct starting materials and intermediates, associated with 2989 productive mechanistic steps and 6.14 million unproductive mechanistic steps. From machine learning, we pose the identification of productive mechanistic steps as a statistical ranking (information retrieval) problem: given a set of reactants and a description of conditions, learn a ranking model over potential filled-to-unfilled MO interactions such that the top-ranked mechanistic steps yield the major products. The machine learning implementation follows a two-stage approach. First, we train atom-level reactivity filters that prune 94.00% of nonproductive reactions with a 0.01% error rate. Then, we train an ensemble of ranking models on pairs of interacting MOs to learn a relative productivity function over the mechanistic steps in a given system. Without using explicit transformation patterns, the ensemble ranks the productive mechanism first 89.05% of the time, rising to 99.86% when the top four are considered.
Furthermore, the system is generalizable, making reasonable predictions for reactants and conditions that the rule-based expert system does not handle. A web interface to the machine learning based mechanistic reaction predictor is accessible through our chemoinformatics portal (http://cdb.ics.uci.edu) under the Toolkits section.
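The two-stage pipeline summarized above (filter candidate orbitals, rank the surviving filled-to-unfilled pairs, then score top-k accuracy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Orbital` type, the threshold-based filter, and averaging member scores as the ensemble rule are all hypothetical, and the trained reactivity filters and ranking models are replaced by plain scoring functions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Orbital:
    """Hypothetical coarse MO descriptor: host atom index plus a feature vector."""
    atom: int
    features: Tuple[float, ...]


# A mechanistic step is modeled here as a (filled source MO, unfilled sink MO) pair.
Step = Tuple[Orbital, Orbital]


def filter_orbitals(orbitals: Sequence[Orbital],
                    reactivity: Callable[[Orbital], float],
                    threshold: float) -> List[Orbital]:
    """Stage 1: keep only orbitals whose predicted reactivity reaches the threshold.

    `reactivity` stands in for a trained atom-level filter (assumption).
    """
    return [o for o in orbitals if reactivity(o) >= threshold]


def rank_steps(sources: Sequence[Orbital],
               sinks: Sequence[Orbital],
               rankers: Sequence[Callable[[Orbital, Orbital], float]]) -> List[Step]:
    """Stage 2: score every source-to-sink pair and sort, most productive first.

    The ensemble score is the mean of the member scores (an assumed combination
    rule); `rankers` must be non-empty.
    """
    def ensemble_score(step: Step) -> float:
        return sum(r(*step) for r in rankers) / len(rankers)

    candidates = list(product(sources, sinks))
    return sorted(candidates, key=ensemble_score, reverse=True)


def top_k_accuracy(rankings: Sequence[List[Step]],
                   productive: Sequence[Step],
                   k: int) -> float:
    """Fraction of systems whose productive step appears among the top k ranked steps."""
    hits = sum(1 for ranked, true_step in zip(rankings, productive)
               if true_step in ranked[:k])
    return hits / len(rankings)


if __name__ == "__main__":
    # Toy system with made-up feature values: three sources, two sinks.
    sources = [Orbital(0, (0.9,)), Orbital(1, (0.2,)), Orbital(2, (0.7,))]
    sinks = [Orbital(3, (0.8,)), Orbital(4, (0.1,))]
    first = lambda o: o.features[0]
    kept_sources = filter_orbitals(sources, first, 0.5)
    kept_sinks = filter_orbitals(sinks, first, 0.5)
    rankers = [lambda s, t: s.features[0] * t.features[0],
               lambda s, t: s.features[0] + t.features[0]]
    ranked = rank_steps(kept_sources, kept_sinks, rankers)
    print([(s.atom, t.atom) for s, t in ranked])  # → [(0, 3), (2, 3)]
    print(top_k_accuracy([ranked], [(sources[0], sinks[0])], k=1))  # → 1.0
```

Filtering before ranking matters because the number of candidate pairs grows as the product of sources and sinks; pruning either side shrinks the ranking workload multiplicatively. Averaging member scores is only one way to combine an ensemble, and the abstract does not state which rule the authors used.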

Similar resources

The Conditions of the Violations of Le Chatelier’s Principle in Gas Reactions at Constant T and P

Le Chatelier's principle offers a very simple way to predict the effect of a change in conditions on a chemical equilibrium. However, although several studies have reported violations of this principle, no simple mathematical equation expressing the exact condition for violation in gas-phase reactions has yet been reported. In this article, we derived a simple equation for the violation of...

Genome scale enzyme-metabolite and drug-target interaction predictions using the signature molecular descriptor

MOTIVATION: Identifying protein enzymatic or pharmacological activities is an important area of research in biology and chemistry. Biological and chemical databases are increasingly being populated with linkages between protein sequences and chemical structures. There is now sufficient information to apply machine-learning techniques to predict interactions between chemicals and proteins at a gen...

A Machine Learning Approach to Predict Chemical Reactions

Being able to predict the course of arbitrary chemical reactions is essential to the theory and applications of organic chemistry. Previous approaches are not high-throughput, are not generalizable or scalable, or lack sufficient data to be effective. We describe single mechanistic reactions as concerted electron movements from an electron orbital source to an electron orbital sink. We use an ex...

No Electron Left Behind: A Rule-Based Expert System To Predict Chemical Reactions and Reaction Mechanisms

Predicting the course and major products of arbitrary reactions is a fundamental problem in chemistry, one that chemists must address in a variety of tasks ranging from synthesis design to reaction discovery. Described here is an expert system to predict organic chemical reactions based on a knowledge base of over 1500 manually composed reaction transformation rules. Novel rule extensions are i...

Neural networks for the prediction of organic chemistry reactions

Reaction prediction remains one of the great challenges for organic chemistry. Solving this problem computationally requires the programming of a vast amount of knowledge and intuition of the rules of organic chemistry and the development of algorithms for their application. It is desirable to develop algorithms that, like humans, "learn" from being exposed to examples of the application of the...

Journal:
  • Journal of Chemical Information and Modeling

Volume 51, Issue 9

Publication date: 2011